IN THE CLAIMS 

1. (Previously Presented) A method comprising:

responsive to receiving a single packed shuffle instruction designating, with 3
bits, a first register storing a first operand having a set of L data elements
and designating, with 3 bits, a second register storing a second operand 
having a set of L control elements, wherein the first operand and second 
operand are of a same size and each of the L data elements and L control 
elements are of a same size, and wherein each one of the L control 
elements is divided into three portions, the first portion being a flush to 
zero bit occupying the most significant bit of each control element, the 
second portion being a position selection field that is at least log2L bits
wide and indicates a position of one of said L data elements, and a third 
portion, storing a resultant operand in said first register having L resultant 
data elements of the same size as the L data elements and the L control 
elements, wherein the value of each resultant data element is controlled by 
the position selection field of the L control elements in the same position 
as the resultant data element, and is either, 

the one of the L data elements designated by the position selection field of 
said control element if said control element's flush to zero bit is 
not set; or 

a zero if said control element's flush to zero bit is set. 
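
For illustration only, the selection rule recited in claim 1 can be modeled in software. The following is a minimal sketch, not the claimed hardware: it assumes byte-wide elements (as in claims 8 and 9), assumes L is a power of two, and assumes the third portion of each control element is simply ignored, since the claim does not say how it is used. The name `packed_shuffle` is hypothetical.

```python
def packed_shuffle(data, control):
    """Model the per-element shuffle rule for byte-wide elements.

    Assumptions (not stated in the claims): L is a power of two and the
    third portion of each control byte is ignored.
    """
    L = len(data)
    if len(control) != L:
        raise ValueError("operands must hold the same number of elements")
    if L == 0 or L & (L - 1):
        raise ValueError("this sketch assumes L is a power of two")

    select_mask = L - 1  # keeps the low log2(L) bits: the position selection field
    result = []
    for ctrl in control:
        if ctrl & 0x80:
            # Flush to zero bit (most significant bit) is set.
            result.append(0)
        else:
            # Copy the data element named by the position selection field.
            result.append(data[ctrl & select_mask])
    return result


data = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17]
control = [0x07, 0x00, 0x80, 0x03, 0x83, 0x01, 0x01, 0x0B]
shuffled = packed_shuffle(data, control)
```

In this example the control bytes 0x80 and 0x83 produce zeros regardless of their selection fields, and 0x0B selects position 3 because only the low three bits are examined when L is 8.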

2. (Cancelled)

3. (Cancelled)

4. (Previously Presented) The method of claim 1 wherein said control element is to 
designate a first operand data element by a data element position number. 

5. (Cancelled) 

6. (Cancelled) 

7. (Previously Presented) The method of claim 1 further comprising outputting a 
resultant data block comprising data that was shuffled from said first operand in response 
to said control elements of said second operand. 

8. (Original) The method of claim 1 wherein each of said data elements comprises a 
byte of data. 

9. (Original) The method of claim 8 wherein each of said control elements is a byte 
wide. 

10. (Original) The method of claim 9 wherein L is 8 and wherein said first operand, 
said second operand, and said resultant are each comprised of 64-bit wide packed data.

11. (Original) The method of claim 9 wherein L is 16 and wherein said first operand,
said second operand, and said resultant are each comprised of 128-bit wide packed data. 

12. (Previously Presented) An apparatus comprising: 

an execution unit to execute a single packed shuffle instruction designating, with 
3 bits, a first register storing a first operand comprised of a set of L data 
elements and designating, with 3 bits, a second register storing a second 
operand comprised of a set of L control elements, wherein the first 
operand and second operand are of a same size and each of the L data 
elements and L control elements are of a same size, and wherein each one 
of the L control elements is divided into three portions, the first portion 
being a flush to zero bit occupying the most significant bit of each control 
element, the second portion being a position selection field that is at least 
log2L bits wide and indicates a position of one of said L data elements,
and a third portion, said shuffle instruction to cause said execution unit to 
store a resultant operand in said first register having L resultant data 
elements of the same size as the L data elements and the L control 
elements, wherein the value of each resultant data element is controlled by 
the position selection field of the L control elements in the same position 
as the resultant data element, and is either, 
a zero if said control element's flush to zero bit is true, otherwise 
the one of the L data elements designated by the position selection field of 
said individual control element.

13. (Original) The apparatus of claim 12 wherein each of said L control elements
occupies a position in said second operand and is associated with a similarly located data 
element position in a resultant. 

14. (Original) The apparatus of claim 13 wherein each individual control element is to 
designate a first operand data element by a data element position number. 

15. (Cancelled) 

16. (Cancelled) 

17. (Previously Presented) The apparatus of claim 12 wherein said shuffle instruction 
is to further cause said execution unit to generate a resultant having L data element 
positions that have been filled based on said set of L control elements. 

18. (Original) The apparatus of claim 12 wherein each of said data elements 
comprises a byte of data and each of said control elements is a byte wide. 

19. (Original) The apparatus of claim 18 wherein L is 8 wherein said first operand, 
said second operand, and said resultant are each comprised of 64-bit wide packed data. 

20. (Original) The apparatus of claim 18 wherein L is 16 and wherein said first
operand, said second operand, and said resultant are each comprised of 128-bit wide
packed data. 

21. (Currently Amended) An article of manufacture comprising a machine readable
storage medium that stores data, that when accessed by a machine, causes the machine to 
perform operations comprising: 

responsive to receiving a single packed shuffle instruction designating, with 3
bits, a first register storing a first operand having a set of L data elements
and designating, with 3 bits, a second register storing a second operand 
having a set of L control elements, wherein the first operand and second 
operand are of a same size and each of the L data elements and L control 
elements are of a same size, and wherein each one of the L control 
elements is divided into three portions, the first portion being a flush to 
zero bit occupying the most significant bit of each control element, the 
second portion being a position selection field that is at least log2L bits 
wide and indicates a position of one of said L data elements, and a third 
portion, storing a resultant operand in said first register having L resultant 
data elements of the same size as the L data elements and the L control 
elements, wherein the value of each resultant data element is controlled by 
the position selection field of the L control elements in the same position 
as the resultant data element, and is either, 

the one of the L data elements designated by the position selection field of 
said control element if said control element's flush to zero bit is
not set; or

a zero if said control element's flush to zero bit is set. 

22. (Currently Amended) The article of manufacture of claim 21 wherein said data 
stored by said machine readable storage medium represents an integrated circuit design, 
which when fabricated performs said predetermined function in response to a single 
instruction. 

23. (Currently Amended) The article of manufacture of claim 22 wherein said 
machine readable storage medium further includes data, that causes the machine to 
perform operations further comprising: 

generating a resultant having L data element positions that have been filled in
accordance to said set of L control elements. 

24. (Previously Presented) The article of manufacture of claim 23 wherein each of 
said L control elements is associated with a similarly located data element position in a 
resultant. 

25. (Previously Presented) The article of manufacture of claim 24 wherein each 
individual control element is to designate a first operand data element by a data element 
position number. 

26. (Previously Presented) The article of manufacture of claim 25 wherein each of
said data elements comprises a byte of data.

27. (Cancelled) 

28. (Cancelled) 

29. (Currently Amended) The article of manufacture of claim 21 wherein said data 
stored by said machine readable storage medium represents a computer instruction, 
which, if executed by a machine, causes said machine to perform said predetermined 
function. 

30. (Previously Presented) A method comprising: 

responsive to receiving a single packed shuffle instruction designating, with 3
bits, a first register storing a first operand having a set of L data elements
and designating, with 3 bits, a second register storing a second operand 
having a set of L masks, wherein the first operand and second operand are 
of a same size and each of the L data elements and L masks are of a same 
size, and wherein each one of the L masks is divided into three portions, 
the first portion being a flush to zero bit occupying the most significant bit 
of each control element, the second portion being a position selection field 
that is at least log2L bits wide and indicates a position of one of said L data 
elements, and a third portion, and wherein each of said L masks occupies a 
particular position in said second operand and is associated with a
similarly located data element position in a resultant operand, storing the
resultant operand in said first register having L resultant data elements of 
the same size as the L data elements and the L masks, wherein the value of 
each resultant data element is controlled by the position selection field of 
the L masks in the same position as the resultant data element, and is 
either, 

a zero if said mask's flush to zero bit is set; or 

if said mask's flush to zero bit is not set, the one of the L data elements 
designated by the position selection field of said mask to said 
associated resultant data element position. 



31.-33. (Cancelled) 



34. (Previously Presented) The method of claim 30 wherein said first operand, said 
second operand, and said resultant are each comprised of 64-bit wide packed data. 



35. (Previously Presented) The method of claim 30 wherein said first operand, said 
second operand, and said resultant are each comprised of 128-bit wide packed data. 



36. (Previously Presented) A method comprising: 

responsive to receiving a single packed shuffle instruction designating, with 3
bits, a first register storing a first operand having a set of L data elements
and designating, with 3 bits, a second register storing a second operand
having a set of L shuffle masks, wherein the first operand and second
operand are of a same size and each of the L data elements and L masks 
are of a same size, and wherein each one of the L shuffle masks is divided 
into three portions, the first portion being a flush to zero bit occupying the 
most significant bit of each control element, the second portion being a 
position selection field that is at least log2L bits wide and indicates a 
position of one of said L data elements, and a third portion, and wherein 
each of said L shuffle masks is associated with a similarly located data 
element position in a resultant operand, storing the resultant operand in 
said first register having L resultant data elements of the same size as the 
L data elements and the L masks, wherein the value of each resultant data 
element is controlled by the position selection field of the L individual 
masks in the same position as the resultant data element, and is either, 
a zero if said mask's flush to zero bit is set, otherwise 
the one of the L data elements designated by the position selection field of
said individual shuffle mask to said associated resultant data
element position.

37. (Cancelled) 

38. (Cancelled) 

39. (Previously Presented) An apparatus comprising:

a first memory location to store a plurality of source data elements;

a second memory location to store a plurality of control elements, each of said 
control elements to correspond to a resultant data element position, and 
wherein each one of said control elements is divided into three portions, 
the first portion being a flush to zero bit occupying the most significant bit 
of each control element, the second portion being a position selection field 
that is at least log2L bits wide and indicates a position of one of said L data 
elements, and a third portion; 

control logic coupled to said first memory location and said second memory 
location, said control logic in response to the receipt of a single packed
shuffle instruction designating, with three bits, a first memory location 
storing a first operand having a set of L data elements and designating a 
second memory location storing a second operand having a set of L 
control elements, wherein the first operand and the second operand are of 
a same size and each of the L data elements and L control elements are of 
a same size, to generate a plurality of selection signals and a plurality of 
flush to zero signals, a zero signal generated when a control element's 
flush to zero bit is set; 

a first plurality of multiplexers coupled to said first memory location and said
plurality of selection signals, each of said first plurality of multiplexers to
store a resultant operand in said first memory location having L resultant 
data elements of the same size as the L data elements and the L control 
elements, wherein the value of each resultant data element is controlled by
the position selection signal of the L control elements in the same position
as the resultant data element, and is the one of the L data elements for a 
specific resultant data element position in response to a selection signal 
corresponding to said specific resultant data element position; and 
a second plurality of multiplexers coupled to said first plurality of multiplexers 
and to said plurality of flush to zero signals, each of said second plurality 
of multiplexers associated with a specific resultant data element position, 
each of said second plurality of multiplexers to output a zero if its flush to 
zero signal is active or to output a data element shuffled for that specific 
resultant data element position. 
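
For illustration only, the two-stage datapath recited in claim 39 can be modeled as three small functions: control logic that derives the selection and flush to zero signals, a first multiplexer stage that selects a source element per resultant position, and a second multiplexer stage that outputs zero where the flush to zero signal is active. This is a sketch under stated assumptions (byte-wide elements, a power-of-two element count, third portion ignored); all function names are hypothetical and do not come from the application.

```python
def control_logic(control, num_elements):
    """Derive one selection signal and one flush to zero signal per position."""
    # Low log2(num_elements) bits form the position selection field
    # (assumes num_elements is a power of two).
    select_signals = [ctrl & (num_elements - 1) for ctrl in control]
    # The most significant bit of each control byte is the flush to zero bit.
    flush_signals = [bool(ctrl & 0x80) for ctrl in control]
    return select_signals, flush_signals


def first_mux_stage(source, select_signals):
    """One multiplexer per resultant position picks a source data element."""
    return [source[sel] for sel in select_signals]


def second_mux_stage(selected, flush_signals):
    """One multiplexer per position outputs zero or the shuffled element."""
    return [0 if flush else value for value, flush in zip(selected, flush_signals)]


def shuffle_datapath(source, control):
    """Chain the control logic and both multiplexer stages."""
    select_signals, flush_signals = control_logic(control, len(source))
    return second_mux_stage(first_mux_stage(source, select_signals), flush_signals)


source = [0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7]
control = [0x00, 0x82, 0x05, 0x05, 0x87, 0x01, 0x06, 0x80]
output = shuffle_datapath(source, control)
```

Splitting selection from zeroing mirrors the claim's separate pluralities of multiplexers: the first stage always produces a candidate element, and the second stage decides whether that candidate or a zero reaches the resultant position.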

40. (Original) The apparatus of claim 39 wherein said plurality of source data 
elements is a first packed data operand. 

41. (Original) The apparatus of claim 40 where said plurality of control elements is a
second packed data operand. 

42. (Original) The apparatus of claim 40 wherein said first and second memory 
locations are single instruction multiple data registers.

43. (Original) The apparatus of claim 42 wherein: 

said first packed operand is 64 bits long and each of said source data 
elements is a byte wide; and 



Appl. No. 10/611,344 






Atty. Docket No. 42P15762 



said second packed operand is 64 bits long and each of said control 
elements is a byte wide. 

44. (Original) The apparatus of claim 42 wherein: 

said first packed operand is 128 bits long and each of said source data
elements is a byte wide; and

said second packed operand is 128 bits long and each of said control
elements is a byte wide.

45. (Previously Presented) An apparatus comprising: 

control logic to receive a single packed shuffle instruction designating, with three 
bits, a first memory location storing a first operand having a set of M data 
elements and designating, with three bits, a second memory location 
storing a second operand having a set of L shuffle masks, wherein each of 
the M data elements and L shuffle masks are of a same size, and wherein 
each one of the L shuffle masks is divided into three portions, the first 
portion being a flush to zero bit occupying the most significant bit of each 
shuffle mask, the second portion being a position selection field that is at 
least log2L bits wide, and a third portion, and wherein each shuffle mask is
associated with a unique resultant data element position controlled by the 
position selection field of said shuffle mask, said control logic to provide a 
select signal and a flush to zero signal for each resultant data element 
position;

a set of L multiplexers coupled to said control logic, wherein each multiplexer is
also associated with a unique resultant data element position, each 
multiplexer to output to said first memory location either, 
a zero if said shuffle mask's flush to zero signal is active or the one of the 
M data elements designated by the select signal of said shuffle 
mask if said shuffle mask's flush to zero signal is inactive.

46. (Original) The apparatus of claim 45 further comprising a register with L unique 
data element positions, each data element position to hold an output from its associated 
multiplexer. 

47. (Original) The apparatus of claim 46 wherein L is 16 and M is 16. 

48. (Previously Presented) A system comprising: 
a memory to store data and instructions; 

a processor coupled to said memory on a bus, said processor operable to perform 
a shuffle operation, said processor comprising: 
a bus unit to receive a single packed shuffle instruction, from said
memory, said instruction to designate, with 3 bits, a first register
storing L data elements from a first operand, and to designate, with 
three bits, L shuffle control elements from a second operand, 
wherein the first operand and second operand are of a same size 
and each of the L data elements and L control elements are of a
same size, and wherein each one of the L control elements is
divided into three portions, the first portion being a flush to zero 
bit occupying the most significant bit of each control element, the 
second portion being a position selection field that is at least log2L 
bits wide and indicates a position of one of the L data elements, 
and a third portion; 

an execution unit coupled to said bus unit, said execution unit to execute said
single packed shuffle instruction, said single packed shuffle instruction to
cause said execution unit to: 

store a resultant operand in said first register having L resultant data
elements of the same size as the L data elements and the L control
elements, wherein the value of each resultant data element is 
controlled by the position selection field of the L control elements 
in the same position as the resultant data element, and is either, 
the one of the L data elements designated by the position selection 
field of said control element if said control element's flush to zero 
bit is not set; or 

a zero if said control element's flush to zero bit is set. 



49.-51. (Cancelled)



52. (Original) The system of claim 48 wherein each data element is a byte wide, each
shuffle command element is a byte wide, and L is 8.

53. (Original) The system of claim 48 wherein said first operand is 64 bits long and 
said second operand is 64 bits long.